Polynomial superlevel set representation of the multistationarity region of chemical reaction networks

In this paper we introduce a new representation for the multistationarity region of a reaction network, using polynomial superlevel sets. The advantages of using this polynomial superlevel set representation over the already existing representations (cylindrical algebraic decompositions, numeric sampling, rectangular divisions) is discussed, and algorithms to compute this new representation are provided. The results are given for the general mathematical formalism of a parametric system of equations and so may be applied to other application domains.

more than one equilibrium. There are already many algorithms developed for answering the binary question of whether a system can exhibit multistationarity [1][2][3][4][5][6][7][8][9]. The input of these algorithms is a reaction network and the output is the confirmation or rejection of the possibility of exhibiting multistationary behavior.
In the case where multistationarity can exist it then becomes important to determine the parameters where the network has this behavior. There has been less progress in this direction in the literature to date: the present paper offers a promising new development for this problem.

Prior work
Reviewing the state of the art in the literature, we see one vein of work focused on specific reaction networks, with success following heuristic or manual calculations to find a suitable parameter which may not work in generality [10,11]. Then, in another vein of work the system of equations for finding equilibria are solved for many random points from the parameter space to approximate the region where the network is multistationary [12][13][14][15].
Recently in [16] a new approach to get a description of the multistationarity region is proposed. In this method one does not need to solve the system of equations to count the number of equilibriums. Instead one computes an integral to get the expected number of equilibriums when the parameters are following a random distribution. This method partitions the parameter region into subsets that are a Cartesian product of intervals, called hyperrectangles. By choosing the uniform distribution and computing the average number of equilibriums on these hyperrectangles, one can approximate the multistationarity region as a union of sub-hyperrectangles. While efficient and widely applicable, this list of hyperrectangles does not allow the reader much information or intuition about the geometry of this region, such as connectedness or convexity.

Contribution
In this work, we propose using polynomial superlevel sets to approximate the union of the hyperrectangles from [16] as a set that can be described by one polynomial. Polynomial superlevel sets are already employed to approximate semi-algebraic sets and have been used in control and robust filtering contexts, see [17,18]. The polynomial superlevel set representation we propose is a more compact representation of the region compared to a list of many hyperrectangles each described as a Cartesian product of intervals. Further, to check if a point belongs to the region given in this representation one can easily just evaluate the polynomial in this point. Further benefits of the polynomial superlevel set description of the region will be explored later.

Organization of the paper
The organization of this paper is as follows. The mathematical framework of reaction networks and the definition of the multistationarity region is given first followed by a section containing the notations regarding parametric functions and definitions of the sampling and the rectangular representations of the multistationarity region from [19,Section 2.4]. We then define polynomial superlevel sets formally and describe how one can algorithmically find a polynomial superlevel set representation of a set using the sampling and the rectangular representations. We demonstrate how to use it to find the polynomial superlevel set representation of the multistationarity region of a reaction network. In the final section we discuss methods that sometimes can speed up computation of the polynomial superlevel set representation by the help of bisecting algorithms and where possible, algorithms for computing the expected number of solutions independently of solving the system itself.

Notations
The cardinality of a set A is denoted by #(A) . Let x ∈ Z and n ∈ Z \ {0} . In this paper we define x modulo n to be n instead of 0 whenever x is a multiple of n. For a function f : A 1 → A 2 and a point u ∈ A 2 , the level set of f is denoted by L u (f ) and is defined as {x ∈ A 1 | f (x) = u} . For two points a = (a 1 , . . . , a n ) and b = (b 1 , . . . , b n ) in R n , the nota- For a subset S of a hyperrectangle B ⊆ R n , let Vol(S) denote the normalized volume of S with respect to B, i.e.
When a random vector X = (X 1 , . . . , X n ) is distributed by a uniform distribution on a set S ⊆ R n , we write X ∼ U (S) . If X is distributed by a normal distribution with mean µ ∈ R n and variance σ 2 ∈ R >0 , then we write X ∼ N (µ, σ 2 ) and we mean that X 1 , . . . , X n are identically and independently distributed by N (µ i , σ 2 ) . The expectation of g(X) when X is distributed by a probability distribution q is denoted by E g(X) | X ∼ q .

Computer information
All computations for the examples of this paper were done on a computer with the following information. Processor: Intel(R) Core(TM) i7-10850H CPU @2.70GHz 2.71 GHz. Installed memory (RAM): 64.0 GB (63.6 GB usable). System type: 64-bit Operating System, x64-based processor.
The software and programming languages used for the computations reported in this paper had the following version numbers: Maple 2021, Matlab R2021a, YALMIP, SeDuMi 1.3, Julia 1.6.2, MCKR 1.0.

Multistationarity region of chemical reaction networks
In this section, we introduce the concepts of reaction network theory that are needed throughout the rest of the paper, with the help of a simple gene regulatory network example.
One can think of a gene as a unit encoding information for the synthesis of a product such as a protein. First, a group of DNA binding proteins called transcription factors bind a region of the gene called promoter. Now an enzyme called RNA polymerase starts reading the gene and produces an RNA until it arrives in the terminator region of the gene. The process until here is called the transcription step. After transcription is completed, the resulting RNA leaves the nucleus (in eukaryotes) and reaches ribosomes. In ribosomes, the second step, called translation, gets started. Ribosomes assemble a  Figure 2], depicted here in Fig. 1a. There are three genes with proteins A, B, and C as their final products. Denote their concentrations at time t by [A](t), [B](t) and [C](t) respectively. The concentration of these proteins will not remain constant all the time, and we have an Ordinary Differential Equation (ODE) system describing the variation of the concentrations as time passes, see Fig. 1b. Each protein is degraded with a first-order kinetics with the reaction rate constants k A,d , k B,d and k C,d correspondingly. Protein A activates the expression of the second gene with Michaelis-Menten kinetics with the maximum rate k B,max and the Michaelis constant k −1 B,A . The third gene gets activated by both proteins A and B together with the product of two Michaelis-Menten kinetics, with maximum rate k C,max and Michaelis constants k −1 C,A and k −1 C,B . The first gene gets expressed by the rate k A,max in the absence of protein C, and protein C has an inhibitory effect on the expression of the first gene, captured by the denominator (1 + k A,C [C](t)) in the rate expression.
A solution to the system d[X i ](t) dt = 0 (where the X i s are A, B and C) is called an equilibrium of the ODE system. Since the concentration of the proteins can only be non-negative real numbers, the complex or negative real solutions are not relevant. Sometimes we may only consider the positive solutions, for example, if a total consumption of a protein is not possible or of no interest. Therefore by steady states we mean positive solutions to the system of equations d[X i ](t) dt = 0 . The equations in this system are called the steady state equations. Now we are ready to define a reaction network formally. A reaction network, or a network for short, is an ordered pair, N = (S, R) where S and R are two finite sets called the set of species and the set of reactions. In our example, S = {A, B, C} and R contains six reactions: three gene expressions and three protein degradations. To each network, an ODE is attached with concentration of the species as its variables and the constants of the reaction rate expressions as its parameters. In our example, we have 3 such variables and 10 parameters. To fix the notation assume S = {X 1 , . . . , X n } and that there are r constants involved in the reaction rate expressions. Then we use A network with an inflow (injection) or an outflow (extraction or degradation) for at least one of its species is called an open network. The network in Fig. 1a is an open network because of the presence of the degradation reactions. A network can also be fully or partially conserved.
Consider the simple single reaction network depicted in Fig. 2a. The system of its ODE equations is given in Fig. 2b. Because ẋ 1 + 2ẋ 3 = 0 , the linear combination x 1 + 2x 3 should be constant with respect to the time. Therefore there exists a positive constant T 1 such that the relation x 1 + 2x 3 = T 1 holds. Similarly there exist two other positive constants T 2 and T 3 such that x 1 + 2x 4 = T 2 and x 2 + 2x 3 = T 3 . The values of T 1 , T 2 and T 3 can be determined by the initial conditions of the ODE system. These linear invariants imply that three of the steady state equations are linearly redundant and can be replaced by these three linear invariants which are called conservation laws in CRN theory. The linear subspace determined by the conservation laws is called the stoichiometric compatibility class. For a more detailed definition of conservation laws see Definition 1 in [19,Chapter 2]. One should note that the trajectories of the ODE system are confined to stoichiometric compatibility classes. In this case, one only cares about the steady states in one stoichiometric compatibility class. Now we are ready to define the main concept of interest, multistationarity.

Definition 2.1
Consider a network with n species and replace redundant steady state equations by conservation laws if there exist any. Let k stands for the vector of constants of both the reaction rates and conservation laws and be of the size r. A network is called multistationary over B ⊆ R r if there exists a k ∈ B such that f k (x) = 0 has more than one solution in R n >0 .

Remark 2.2
i) One may also consider non-linear invariants such as first integrals as defined in [21,Definition 11]. ii) Note that we are not concerned with the choice of the kinetics such as mass-action, Michaelis-Menten, Hill function, power-law kinetics and S-systems, or the form of the steady state equations such as polynomial or rational functions. Therefore the results of this paper will remain valid and practical for a general reaction network.
From here on the word parameters also includes the constants of the conservation laws in addition to the reaction rate constants. To answer the question of whether a network is multistationary or not one can use one of many algorithms available in the literature, see [1-5, 7, 9] for a few examples. However, to partition the parameter space into two subsets, one consisting of the choices of parameters for which f k (x) has more than one solution and the other comprising those parameter choices for which f k (x) has at most one solution, is a more laborious task which we tackle in this paper.

Definition 2.3
Consider a reaction network with the setting and notation of Definition 2.1. The set is called the multistationarity region of the network.
The region B in Definition 2.3 represents the regions of scientific interest. It will usually be a hyperrectangle made by the inequality restrictions of the form k i,min < k i < k i,max for the parameters. This is because, for example the rate of expression of a gene can not be any arbitrary positive number but must be limited; or the constant of conservation laws may be limited from above due to the limitation of the materials in the lab.

Prior state-of-the-art for parametric systems of equations
Let f k : R n → R m be a parametric function with B ⊆ R r as its parameter region and u a point in R m . For each choice of the parameters k ⋆ ∈ B , the system f k ⋆ (x) = u is a nonparametric system of equations. One can solve this system and look at the cardinality of the solution set. For different choices of k ⋆ , this number can be different. Therefore we define a new function u f : B → Z ≥0 ∪ {∞} sending k ∈ B to # L u (f ) , i.e. the size of the level set of f k (the set of points in R n which f k maps to u). Now one can partition B into the union of level sets of the map u f . For a general form of f k (x) , finding L i (� u f ) is a hard question.

CAD with respect to discriminant variety
In the case where f k (x) ∈ R(k) [x] m and A and B are semi-algebraic sets there are a variety of tools which can be employed, see for example [22]. In the literature, the approach used most commonly (e.g. [10,23,24]) is a Cylindrical Algebraic Decomposition (CAD) computed with respect to the discriminant variety. For a full description of this technique we refer the reader to [25,26] or the short sketch of the main idea in [24, section 3]. Briefly: the discriminant variety of the system f k (x) with the domain and codomain restrictions on the semi-algebraic sets A and B is the solution set to a new set of (non-parametric) polynomial equations with k as its indeterminants. This new set of polynomials can be computed algorithmically for example using Gröbner bases and elimination theory. Then CAD with respect to the discriminant variety decomposes B into a finite number of connected semi-algebraic sets called cells. Each cell has intersection with only one L i (� u f ) and therefore L i (� u f ) can be expressed as union of a finite number of cells with an exact description of their boundaries.
As we see later in this section, in many cases one is only interested in open cells (i.e. only those cells which have full dimension [27]). A Maple package, RootFinding[Parametric] has implemented an algorithm to compute the open CAD with respect to the discriminant variety of a system of parametric polynomial equations and inequalities [28]. From here on in this paper by CAD we mean such an open CAD with respect to the discriminant variety.
Both the computation of the discriminant variety and the subsequent decomposition involve the use of algorithms with doubly exponential complexity which can cause problems. The number of cells in the decomposition will grow doubly exponentially in the number of parameters of f k (x) [29]; and even computation of the discriminant variety itself before any decomposition can be infeasible for moderate examples, see e.g. [24]. This makes CAD impractical for studying parametric systems of polynomial equations with more than a few variables and parameters.

Approximation by sampling
Another approach adopted by scientists is to solve the system f k (x) = u for many different choices of k ∈ B [12][13][14][15].
Mathematically speaking, this means that B is replaced by a finite set. Then each L i (� u f ) is expressed as a subset of this finite set. This approach hereafter is referred as the sampling representation approach. In contrast with the CAD approach which provides an exact description of L i (� u f ) , the sampling representation approach provides an approximation. Note that there are different ways to choose the sample parameter points for the sampling representation. One way is to arrange all points equally distanced like a grid, and another way is to randomly sample from a distribution such as the uniform distribution on B, which is the one used in this paper. For an example of a case where a sampling representation with grid-like parameter sampling is used see [10,.
Since we are motivated from the application, we should note that in a lab, it is usually not possible to design the experiment so that the parameter values are exactly the numbers that we decide. Therefore when the experiment is designed to have k = k ⋆ , what happens is that k is a point in a neighborhood of k ⋆ and not necessarily k ⋆ itself. This can happen for example because of errors coming from the measurement tools or the noise from the environment. In such cases picking a point close to the boundaries of L i (� u f ) could lead to a different result than what the experimentalist expects, if errors or noise push it over the boundary.

Rectangular representation
A different discretization can be done using a rectangular division of B. For example if B is a hyperrectangle [a, b] then a grid on B is achieved by dividing B along each axis to equal parts. Then for each sub-hyperrectangle of B in this rectangular division we assign the average of the number of solutions of f k (x) = u for several choices of k coming from the sub-hyperrectangle. This approach hereafter is referred as the rectangular representation approach. See Fig. 4 to compare the three approaches visually.

Example
Consider the gene regulatory network in [30, Figure 3B], depicted here in Fig. 3a with the ODE system in Fig. 3b. This network has one conservation law, x 1 + x 4 = k 8 . Therefore we consider the system of equations obtained by the first three steady state equations in the ODE system and the conservation law to study the multistationarity of this network. For illustration purpose we fix values of all parameters other than two so we can plot the multistationarity region in 2 dimensions. In [30, Figure 4] the reaction rate constants other than k 3 were fixed to the values listed below.
We fix the values of all parameters other than k 3 and k 8 to these values also.
(1) k 1 = 2.81, k 2 = 1, k 4 = 0.98, k 5 = 2.76, k 6 = 1.55, k 7 = 46.9. . 3 A bistable autoregulatory motif presented in [30, Figure 3B]. a X is a gene, P is a protein that can form a dimer PP and then binding to X. The gene X will get expressed and produce P in both forms X and XPP. Finally there is a degradation of P. b The ODE system of the gene regulatory network in a. The variables x 1 , x 2 , x 3 and x 4 are standing for the concentration of the species X, P, PP and XPP respectively Let B be the rectangle made by the constraints 0.0005 < k 3 < 0.001 and 0 < k 8 < 2 . Using the RootFinding[Parametric] package of Maple we get the exact description of the multistationarity region of the network, in 0.12 s, depicted in Fig. 4a.
A sampling representation of the multistationarity region is found by solving the system of the equations for 1000 points (k 3 , k 8 ) sampled from the uniform distribution on [(0.0005, 0), (0.001, 2)]. We used the vpasolve command from Matlab to solve the system numerically. The Matlab code to generate this sampling representation took 154 s to run, with the output visualised in Fig. 4b.
A rectangular representation is given by dividing [(0.0005, 0), (0.001, 2)] to 100 equal sub-rectangles and then solving the system for 10 points (k 3 , k 8 ) sampled from the uniform distribution on each sub-rectangles. The sub-rectangles are colored with respect to the average number of solutions. This computation also was done by Matlab and took 166 s, with the output visualised in Fig. 4c. Polynomial sublevel sets are defined similarly as in Definition 4.1 with the only difference the direction of the inequality. However, in this paper, we only focus on superlevel sets. For d ∈ Z ≥0 let P d denote the set of polynomials of total degree at most d. A sum of squares (SOS) polynomial of degree 2d is a polynomial p ∈ P 2d such that there exist p 1 , . . . , p m ∈ P d so that p = m i=1 p 2 i . We denote the set of SOS polynomials of degree at most 2d by 2d .  pair (B, K) where B ⊆ R n is a compact set and K ⊆ B a closed set, and given d ∈ N ; we call the polynomial superlevel set U(p) (with p being the polynomial found in Theorem 4.2) the PSS representation of K ⊆ B of degree d. When K is a semialgebraic set, one can find p d numerically using a minimization problem subject to some positivity constraints [18,Equation 13].

Polynomial superlevel set representation
By solving a similar optimization problem it is possible to find the PSS representation of K ⊆ B . Let d ∈ N . The goal is to find the coefficients of a polynomial of degree d such that B p(x)dx becomes minimum subject to some conditions. Before presenting the constraints, let us look at the target function. A polynomial p(x) of degree d can be written as α∈N n d c α x α . Here N n d is the set of α = (α 1 , . . . , α n ) ∈ Z n ≥0 such that n i=1 α i ≤ d . Now the integral can be simplified as below.
Since B x α dx are constant real numbers independent of the coefficients of the polynomial, the target function is a linear function on the coefficients of p(x) which are the variables of the optimization problem. Now let us look at the constraints. First of all p(x) has to be nonnegative on B. This can be enforced by letting where r = ⌊ d 2 ⌋ the largest integer less than or equal to d 2 . Secondly we need p(x) ≥ 1 on K or in other words p(x) − 1 ≥ 0 on K. This holds if and only if p(x) − 1 ≥ 0 on each K i . Therefore for every i = 1, . . . , m one more constraint of the shape (2) has to be added: Recall Definition 2.3: the multistationarity region of a network is in fact a superlevel set, U 2 (� 0 f ) . The goal is to find a PSS representation of the set U 2 (� 0 f ) . One way to accomplish this goal is to find a rectangular representation of the multistationarity region and then solve the above mentioned SOS optimization problem. The next example illustrates this idea. To tackle it we use a Matlab toolbox called YALMIP [31,32] which can receive an SOS optimization problem, process it and use other solvers to solve it. For the solver to be used by YALMIP, we chose SeDuMi [33].

Examples
We continue with the example from Fig. 3. Consider the rectangular representation of the multistationarity region of that example given in Fig. 4c. To find the PSS representation of this set, we let B = [(0.0005, 0), (0.0010, 2)] and K be the union of rectangles K i s such that their associated number is greater than or equal to 2. From the 100 sub-rectangles of B, 9 of them satisfy this condition. These sub-rectangles are colored orange in Fig. 5. We use the YALMIP and SeDuMi packages of Matlab to solve the SOS optimization discussed before this example. To report the computation time we add the two times reported in the output of YALMIP: the "yalmiptime" and "solvertime". It takes about 2 s to get the coefficients of the polynomial p of the PSS representation of degree 2. Figure 5 shows the plot of U(p). Unfortunately the same code does not produce a better approximation when we increase d, the degree of p, from 2 to 4, 8 or 16. The output from Matlab gives similar figures in these cases as Fig. 5.
Consider another gene regulatory example from [19,Chapter 2]. To avoid lengthening the text, we only reproduce the system of equations needed to study the multistationarity of the network: We fix all parameters other than k 7 and k 8 to the following values coming from Equation (3)  We reproduced the rectangular representation of the multistationarity region of this network by Matlab, as shown in Fig. 6a. From the 100 sub-rectangles in total, for 28 of them the average number of steady states is greater than or equal to 2. Using YALMIP and SeDuMi it took between 1 and 2 s to get the polynomials of the PSS representations of degrees 2, 4 and 8 represented in Fig. 6b-d respectively. For this example, the PSS approximation of degree 4 looks different than that of degree 2, but for degree 8 the plot looks similar to degree 4.

Advantages of PSS representation over rectangular representation
It is natural to ask why one should find a PSS representation of the multistationarity region using the rectangular representation given one already has the rectangular representation? Let B ⊆ R r be the parameter region of the form of a hyperrectangle, and K ⊆ B be the multistationarity region. In the rectangular representation we have In the PSS representation we have K ≃ U (p) where p is a polynomial of degree d.  1-When r ≥ 4 , plotting K is impossible. In order to save or show the rectangular representation one needs to use a matrix of size (m) × (2r) , where each row stands for one K i and the first r columns have the coordinates of the point a K i and the second r columns correspond to the coordinates of the point b K i . However for the PSS representation one needs to use only a vector of size entries are coefficients of the terms of degree i. The terms are ordered from smaller total degree to larger and for the terms of the same total degree we use the lexicographic order. 2-To test if a point k ⋆ ∈ B belongs to K using the rectangular representation one should check m conditions of the form k ⋆ ∈ K i which means verifying an inequality on each coordinate of the point, i.e. a K i ,j ≤ k ⋆ j ≤ b K i ,j . If one of the conditions k ⋆ ∈ K i is positive, then there is no need to check the rest, otherwise all should fail to conclude that k ⋆ ∈ K . However, using the PSS representation one needs to check only one condition of an evaluation form, i.e. p(k ⋆ ) ≥ 1. 3-Recall from the last paragraph of the "Approximation by sampling" Section explaining that parameters near the boundary of the multistationarity region are not suitable choices for an experimentalist. To check the distance of a point k ⋆ ∈ B \ K to the boundaries of K using the rectangular representation one should find distance of k ⋆ from boundaries of each K i and then taking the minimum. However, using the PSS representation, in both cases of k ⋆ ∈ B \ K or k ⋆ ∈ K , one just needs to find the distance of k ⋆ from the algebraic set defined by p(k) − 1 = 0 , for example by Lagrange multipliers, as in the next section.
To conclude, if r + d r is considerably smaller than 2mr, then storing the PSS representation instead of the initial rectangular representation will save memory without loosing information about the multistationarity region.

Approximating the distance of parameter point from the boundary
To illustrate how to approximate distance of a parameter point from the boundaries of the multistationarity region using a PSS representation we continue with the example from Fig. 3. Let p be the polynomial of degree 2 in two variables k 7 and k 8 corresponding to U(p) in Fig. 6b. We will approximate distance of the point k ⋆ = (0.08, 0.02) from the boundary of the multistationarity region by the distance of k ⋆ from the algebraic set defined by p(k 7 , k 8 ) − 1 = 0 . This question is equivalent to minimizing the Euclidean distance function of a point (k 7 , k 8 ) from the point k ⋆ subject to the constraint (k 7 , k 8 ) ∈ L 1 (p) . The target function is (k 7 − 0.08) 2 + (k 8 − 0.02) 2 which gets minimized if and only if (k 7 − 0.08) 2 + (k 8 − 0.02) 2 gets minimized. An elementary way to solve this minimization problem is to use the method of Lagrangian multipliers [34, Chapter 7, Theorem 1.13]. Define F (k 7 , k 8 , ) = (k 7 − 0.08) 2 + (k 8 − 0.02) 2 + p(k 7 , k 8 ) − 1 . Now we must find the critical points of F (k 7 , k 8 , ) . So we should solve the system of equations obtained by ∂F ∂k 7 = ∂F ∂k 8 = ∂F ∂ = 0 . It takes 0.167 s to solve this system of equations by the solve command in Maple. It has 4 solutions, from which 2 belong to the rectangle B = [(0, 0), (0.1, 0.1)] and the minimum of the target function is obtained at the point (0.04499222669, 0.04161251428). The distance of this point from k ⋆ is 0.04114176669.

Constructing a PSS representation from a sampling representation
It is not necessary to have a rectangular representation to get the PSS representation. Let B = [a B , b B ] be a hyperrectangle and K = {a (1) , . . . , a (m) } a finite set. Let d ∈ N , and the goal be to find the coefficients of a polynomial of degree d such that B p(x)dx becomes minimum subject to some conditions. We already saw that the target function is linear. The constraint p ≥ 1 on K can be enforced by p(a (i) ) ≥ 1 for every i, which are linear constraints. The positivity of p on B can be enforced by Eq. (2) or by adding a large enough number of random points from B and putting the constraint p(a) > 0 . The later idea makes the problem solvable by any common linear programming tool. However, here we still use Eq. (2).
Let us illustrate this with our ongoing example. Consider the sampling representation of the multistationarity region of the network of Example 3.4 given at Fig. 4b. To find the PSS representation of this set, we let B = [(0.0005, 0), (0.001, 2)] and K to be the set of points for which the system f k (x) = 0 had more than one positive solution. There are 1000 points from which 78 of them are parameter choices where the network has three steady states. Using the YALMIP package of Matlab, it takes less than a second to get the coefficients of each of the polynomials p of the PSS representation of degrees 2, 6 and 10. Figure 7a-c show the plots of U(p) for degrees 2, 6 and 10 respectively. The plot for degree 6 actually looks worse than the plot for degree 2 (further away from the actual result in Fig. 4a), although the one for degree 10 looks a little better. In all these cases YALMIP finished the computations with a message 'Numerical problems (SeDuMi)' indicating that the solver found the problem to be numerically ill-posed. Rescaling the parameter region of interest, B, to [(0, 0), (1, 1)] and then transforming the PSS polynomial back to the original B allows a better PSS approximations via YALMIP. For degrees 2 and 6 the numerical problem message is avoided but for degree 10 it remains. The results are shown in Fig. 7d-f.

Advantages of PSS representation over the sampling representation
Let us address why one should find a PSS representation of the multistationarity region using the sampling representation if one already has a sampling representation. Let B ⊆ R r be the parameter region of the form of a hyperrectangle, K ⊆ B be the multistationarity region. In the sampling representation we have {a (1) , . . . , a (m) } ⊆ K . In the PSS representation we have K ≃ U (p) where p is a polynomial of degree d.
1-When r ≥ 4 , plotting K is impossible. In order to save or show the sampling representation one needs to use a matrix of the size m × r , where each row stands for one point a (i) and the columns correspond to the coordinates of the points. However, for the PSS representation one needs to use a vector of the size r + d r , as explained in the first item of the "Advantages of PSS representation over rectangular representation" section1. 2-To test if a point k ⋆ ∈ B belongs to K using the sampling representation is not a straightforward task. However, using the PSS representation one needs to verify only one condition of the evaluation form, p(k ⋆ ) ≥ 1. 3-To compute the distance of a point k ⋆ ∈ B to the boundaries of K, using the sampling representation, if k ⋆ ∈ K , one should compute the distance of k ⋆ from each point in the sampling representation of K and then take the minimum. Using the PSS representation, whether k ∈ K or not, one just needs to find the distance of k ⋆ from the algebraic set defined by p(k) − 1 = 0.
In a typical example from CRN theory, r is usually much higher than 2, and therefore item 1 is really important. When r + d r is lower than rm, one can use less memory by saving the PSS representation instead of keeping all the points of the sampling representation in the memory. Further, as items 2 and 3 show, this will not cause a loss of information about the multistationarity region.

More involved example
We showed earlier that the PSS representation can be generated for examples with a higher number of parameters than two. There, we let all 8 parameters of the network in Fig. 3a to be free and found the PSS representation of degree two in 8 indeterminants. Now we bring another such example which also serves to emphasize Remark 2.2 item (ii): that to have a PSS representation of the multistationarity region, one does not need to have the right hand side functions of the ODE system to be of polynomial or even rational functions. Consider a gene expression system with 4 species X i , i = 1, . . . , 4 where these species can be m-RNA or protein molecules or other relevant factors, with the ODE system as in Fig. 8a which was introduced in [35, Figure 2]. As one can see the right hand side functions involve at least a square root, and as a result this system is not polynomial, or In [35] it is shown that for (β 2 , β 3 , β 4 ) = (5, 3, 12) , the network is bistable, with three steady states. Here we rename these three parameters by k i , i = 1, 2, 3 respectively, let them vary in the 3-dimensional cube B = [(3, 1, 10), (7,5,14)] , and look for the multistationary region. I.e. we seek the relation between these three parameters so that the number of steady states of the network remains the same. Substituting the values into Fig. 8a and letting dx i /dt = 0 , i = 1, . . . , 4 gives the following system of equations.
With a simple calculation one can see that the set of positive solutions to the system of Eq. (6) is in one to one correspondence with the the set of positive roots of the following univariate polynomial: (7) f (y) = 4 24k 2 1 k 2 k 3 3 y 73 − 144y 64 + 256 4 24k 2 1 k 2 k 3 3 y 9 − 12288. Fig. 8 A gene expression network with an S-system kinetics that allows Hill terms and exhibits multistationary behavior, taken from [35, Figure 2]. a The ODE system of the network. The set of species is {X 1 , X 2 , X 3 , X 4 } . Each equation dx i /dt consists of two terms responsible for production and degradation of the species X i . The right hand side functions are not necessarily of a polynomial type, or even a rational function form. b The sampling and PSS representation of the multistationarity region of this network when all the parameters other than β i , i = 2, 3, 4 are fixed to the values in (5) and the remaining three parameters renamed k i , i = 1, 2, 3 vary in the cube B = [(3, 1, 10), (7,5,14)] . The sampling representation consists of the orange color points and the PSS representation is the area between the two yellow colored surfaces For any positive root of the polynomial in (7), a positive solution for (6) is By Descartes' rule of signs it is clear that the polynomial in (7) has 1 or 3 positive roots counted by multiplicity for any choice of k i s. To find the sampling representation of the multistationarity region of our system we solved the equation f (y) = 0 for 1000 random choices of the parameters using Matlab, which took about 8 min. At 532 of these choices the polynomial had 3 positive roots. The sampling representation is shown in Fig. 8b, with the points where the system has 3 steady states shown as orange spheres. We then used the information from the sampling representation to compute the PSS representation of the multistationarity region of degree two, which took less than 2 s. The resulting polynomial is below, with coefficients given to 6 decimal places.
Therefore the PSS representation of the multistationarity region is the set of the points (k 1 , k 2 , k 3 ) ∈ B that satisfy p(k 1 , k 2 , k 3 ) ≥ 1 , which is the region between the two yellow surfaces in Fig. 8b. Note that the point (5,3,12) corresponding to the value examined in [35] is in the middle of this region.

Bisection algorithms for rectangular representation
As shown in the previous section, it is possible to get the PSS representation both from the sampling and the rectangular representations. If one needs to solve the system at random points and take an average in order to get the rectangular representation, then using the rectangular representation does not have much advantage over using the sampling representation for the purpose of finding the PSS representation. But in some cases it is possible to compute the average number of the solutions without solving the system. One such case is introduced in [16]. Instead of solving the system for many points, it is enough to compute one integral called the Kac-Rice integral. In this situation, if the computation of the integral is possible and faster than solving the system for many random points, then the rectangular representation can be preferred to the sampling representation. However, one still needs a computation per each sub-hyperrectangle of the rectangular representation and this number can grow by the number of parameters. If B ⊆ R r and we divide it along each axis to m equal parts, then the number of sub-hyperrectangles in the rectangular representation becomes m r . A different approach to build a rectangular decomposition is to use bisection algorithms instead of dividing B to equal sub-hyperrectangles. When using a bisection algorithm, (x 1 , x 2 , x 3 , x 4 ) = k 1 k 2 k 3 6 y 6 , k 2 k 3 6 y 12 , k 3 3 y 12 , y 16 .

Two possible number of solutions for a given parameter point
Let us simplify the question. There is a hyperrectangle B ⊆ R r and a function g : B → Z ≥0 ∪ {∞} which in our case is 0 f associated to a parametric function f k (x) . Let B be the set containing all sub-hyperrectangles of B. The goal is to express L i (g) or U 2 (g) as a union of sub-hyperrectangles of B. One of the shapes of multistationary networks, observed often for realistic models, are bistable networks with a folding type of bifurcation, such as in dual phosphorylation cycle [36].
In our settings these networks have one steady state for some choices of parameters and three 1 steady states for some other choices of the parameters and for a zero measure set of parameters in the boundary of the two regions it has two steady states. This was the case for the networks studied earlier: the LacI-TetR gene regulatory, and auto-regulatory motif networks. In such cases 0 f is almost always either 1 or 3. Going back to our question, motivated from application, assume Im(g) = {n 1 , n 2 } where n 1 n 2 . In this case for each S ∈ B one of the followings occurs.
(i) E g(k) | k ∼ U (S) = n 1 . This can happen if and only if for almost every k ∈ S , g(k) = n 1 .
(ii) E g(k) | k ∼ U (S) = n 2 This can happen if and only if for almost every k ∈ S , g(k) = n 2 .
(iii) E g(k) | k ∼ U (S) = α, n 1 � α � n 2 . This can happen if and only if S ∩ L n 1 (g) and S ∩ L n 2 (g) both are nonempty and of nonzero measure. The proof is straightforward by noting that Therefore one can compute E g(k) | k ∼ U (B) . Then if the answer is n 1 conclude that almost the whole B is a subset of L n 1 (g) and if the answer is n 2 , conclude that almost the whole B is a subset of L n 2 (g) . Otherwise we proceed by dividing B along only one axis into two equal sub-hyperrectangles. We continue in this fashion until each subhyperrectangle is inside L n 1 (g) or L n 2 (g) or a termination condition on the length of the edges of the sub-hyperrectangles is met. When the termination condition on the edges is obtained, we put the sub-hyperrectangle in L n i (g) if E(g(k)) on this sub-hyperrectangle is closer to n i . We refer to this approach as the two-value bisection search hereafter. This algorithm is formally written in Algorithm 1.
E g(k) | k ∼ U (K ) = n 1 Vol K ∩ L n 1 (g) + n 2 Vol K ∩ L n 2 (g) . 1 Two stable and one unstable, however we do not study stability of the steady states in this paper. A similar algorithm was presented in [16, Section 2]. The first difference is that the input to the bisection algorithm in [16] is not necessarily a system with two general number of solutions. The second difference is that the output, there, is not only the two lists L n 1 (g) and L n 2 (g) (compare Fig. 9d-f with [16, Figure 1c]).
If the length of the edges of B are of different scales, then it is better to replace the termination condition of Algorithm 1 with the following: min b S,j − a S,j b B,j − a B,j | 1 ≤ j ≤ r ≤ ǫ,